Structuring and sampling complex conformation space: Weighted ensemble dynamics 

simulations 
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Based on multiple simulation trajectories, which started from dispersively selected initial con- 
formations, the weighted ensemble dynamics method is designed to robustly and systematically 
explore the hierarchical structure of complex conformational space through the spectral analysis of 
the variance-covariance matrix of trajectory-mapped vectors. Non-degenerate ground state of the 
matrix directly predicts the ergodicity of simulation data. The ground state could be adopted as 
statistical weights of trajectories to correctly reconstruct the equilibrium properties, even though 
each trajectory only explores part of the conformational space. Otherwise, the degree of degeneracy 
simply gives the number of meta-stable states of the system under the time scale of individual trajec- 
tory. Manipulation on the eigenvectors leads to the classification of trajectories into non-transition 
ones within the states and transition ones between them. The transition states may also be pre- 
dicted without a priori knowledge of the system. We demonstrate the application of the general 
method both to the system with a one-dimensional glassy potential and with the one of alanine 
dipeptide in explicit solvent. 

PACS numbers: 02.70.Ns, 87.15.A-, 82.20.Wt 



I. INTRODUCTION 



To explore the conformational space of physical sys- 
tems, standard molecular-dynamics (MD) and Monte 
Carlo (MC) simulation methods have been applied for 
decades. All the dynamic and thermodynamic informa- 
tion of a model system could be extracted from a long 
enough, ergodic simulation trajectory. For complex sys- 
tems with a rugged free-energy surface, due to the exis- 
tence of lots of free energy barriers, the standard methods 
are no longer promising for enough sampling, which - as 
a result - impulses the development of various advanced 
simulation techniques [l|, [3, H, 0, d, 0, 0, Hi- Despite 
the success already achieved, it is still difficult to thor- 
oughly investigate a practically interesting system by a 
single trajectory simulation with these advanced algo- 
rithms. Besides, to ensure the convergence of simulation, 
a few trajectories may need to be generated separately, 
and then the difference between these trajectories could 
be measured @ to sentence the ergodicity of simulation 
under the simulated time scale. Thus, it is reasonable 
to ask whether we could benefit from the state-of-the-art 
distributed computation technique to explore the confor- 
mational space with multiple parallel generated simula- 
tion trajectories. Actually, one solution, e.g., the ensem- 
ble dynamics (ED) method El, has already been 
well established to investigate the transition dynamics 
between specific states with thousands of short simula- 
tion trajectories With a similar technique, but dif- 
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ferent spirit, we develop a method for systematically dis- 
covering the meta-stable states in conformational space 
and the transition events between them. It could also be 
applied to more efficiently sample the complex conforma- 
tional space. 

The method is bases on three considerations. (1) Com- 
pared to the single simulation trajectory, parallel gen- 
erated trajectories, which started from dispersive initial 
conformations, are possible to cover a much larger area 
in conformational space with the same total simulation 
length. (2) Ergodicity broken simulation also provides 
meaningful information of conformational space. Exist- 
ing methods for constructing the hierarchical state struc- 
ture usually demand thoroughly a sampling of the system 
either energetically 13] or kinetically tM, il5i, JJ3, 17, 18]. 
As a matter of fact, the state structure is an effect of sim- 
ulation time scale. With long enough observation time, 
the whole conformational space seems like one state. 
Otherwise, the sub-states structure of the system will 
emerge. Thus, even though each trajectory in ED is not 
ergodic, their mutual relation will reflect the state struc- 
ture of the system under the time scale of the individ- 
ual simulation trajectory. (3) The short ED trajectories 
could be combined to give out equilibrium properties of 
the system. Weighted histogram analysis method [l^ 
is one existing technique to combine several trajectories 
by estimating the density of states in energy space. In 
ED, since every trajectory is generated by the same MD 
(or MC) simulation, we could simply specify a statistical 
weight to each trajectory to reconstruct the equilibrium 
distribution. The weighting scheme leads to the name of 
our method, weighted ensemble dynamics (WED). 

Given multiple MD trajectories generated by the same 
algorithm and condition, WED provides a way to system- 
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atically understand the information in existing data with- 
out requiring much foreknowledge. It classifies the tra- 
jectories into non-transition ones within the meta-stable 
conformational regions (states), and transition ones be- 
tween these states. The trajectories inside the same state 
could be combined to mimic the intrastate equilibrium 
distribution, and some of the transition trajectories could 
be shortened to locate the well-defined transition state 
ensemble. Based on the existing data, further exploration 
of the system could be efficiently guided by WED, and 
the temporally hierarchical structure of the conforma- 
tional space could be distilled out step by step. 

The paper is organized as follows. The theory of WED 
will be introduced in Sec. HH The results and discussions 
are presented in Sec. IIIII A short conclusion is given 
in Sec. |lVl Finally, the detailed simulation methods are 
listed in the Appendix. 



II. THEORY 

Let us consider a set of r— length, normally simulated 
MD (or MC) trajectories. The distribution of their initial 
conformations Pmitio) will evolve along the trajectories 
to the equilibrium distribution Pgq (q) with long enough r. 
Here q generally denotes the conformation coordinates of 
the system. The time for approaching equilibrium is de- 
pendent on the deviation of Pmitio) from Peq(q)- Instead 
of waiting for the final equilibration, we may specify each 
trajectory a statistical weight. Then, the equilibrium en- 
semble average of any physical quantity A{q) could be 
estimated by 



{Ameq = 



(1) 



where, Wi is the weight of the ith trajectory. ( . . .)eg and 
{■ ■ ■)i denote the average over the equilibrium distribu- 
tion and the ith trajectory, respectively. 

Considering the deviation of Pinit{<l) from Peq{q), we 
could choose 



init^eq 



Pinitiqio) 



(2) 



as Wi- Here (f^g is the initial conformation of the ith tra- 
jectory. For example, if the initial conformations were 
chosen from the simulation with modified potential U{q)^ 
we could set Wi oc exp[/3/7(^io) ~ /^^^(fto)] to reproduce 
the equilibrium properties of the original system with the 
potential V{q). However, this normal weighting scheme 
demands the knowledge of Pinit{q), and usually suffers 
from the enormous fluctuation of {^init,eqiqio)} in sys- 
tems with large degrees of freedom. As a result, only a 
few trajectories with dominating Wi could significantly 
contribute to the weighted average. 

Actually, while the length of simulation trajectories, 
r, is not very short, the conformational space could be 
divided into a few meta-stable regions, wherein a single 



trajectory could reach local equilibrium within r. Thus, 
for trajectories that started from the same meta-stable 
region, the specified weights - although dependent on the 
initial conformations - should be approximately identical 
when reproducing the equilibrium properties. Therefore, 
we could write 



Wi oc 



j^^^^Pinit{q)dq 



(3) 



where a{i) denotes the meta-stable region, in which the 
ith trajectory is started. Instead of exphcitly using 
Eq. ([3]), we simply construct a new ensemble of conforma- 
tions —A"— in practice. X is constituted by the confor- 
mations in the initial t time length of all the simulation 
trajectories. Within short enough t (compared to r), 
each trajectory is supposed to be still exploring the same 
meta-stable conformational region, leading to the quite 
plausible conclusion that Px{q)dq « Pinit{q)dq for 
all the meta-stable regions. Here, a denotes the meta- 
stable region in consideration; Px{q) denotes the distri- 
bution function of the conformations in X (all the distri- 
bution functions are supposed to be normalized). Con- 
sequently, Px{q), instead of Pinitiq), could be used to 
estimate the trajectory weights as follows, 



{^X,eq{q))i+ = 



Peqjq) 

Px{q) 



(4) 



Here, {■■■)i+ denotes the average over the initial i- 
length segment of the ith trajectory, i.e.. 



{A{q)),+ [ Am'))dt\ 

t Jo 



(5) 



for any conformational function A{q). Here qi{t') de- 
notes the conformation at t' time in the ith trajectory. 
Heuristically, by averaging ilx^eqiq) over the initial short 
segment of each trajectory, we are calculating {wi} ac- 
cording to the initial regions of the trajectories, rather 
than solely by the initial conformations. 

Although the analytical expression of Px{q) is un- 
known, considering the general relation 



{nx,eq{q)Amx - {A{q)) 



eq 



(6) 



for any conformational function A{q)^ we could linearly 
expand U,x,eq{q) with a complete set of conformational 
functions (also referred to as basis functions in the fol- 
lowing) {A^^{q)}, as 

nx,eq{q) = 1 + 5]5M.(^^)('5xA^(<r))e, SxA'^iq), (7) 

where, 5xA^'{q) = A^{q) ~ {A'^mx and g^,,{Px) is 
the inverse of the variance-covariance matrix, g^'^{Px), 
which is defined as {Sx A^ {q)Sx A" (q)) x with {■ • ■)x de- 
noting the average over X samples. In Eq. ([7]), the equi- 
librium ensemble average {6xA^{q))eq actually relies on 
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the trajectory weights {wi} through Eq. ([T]). Thus, sub- 
stituting Eq. ([7]) into Eq. ^ gives out the following self- 
consistent linear equations of {wi}, 

Wi^l + '^T.jWj, (8) 

j 

where T,, = ^E,.,.9f.APx){SxAnq)),^{SxA-^{q)),, 
i, j = 1, ■ ■ ■ ,p. p is the number of trajectories, and we 
already set ^ - Wi = p. All the parameters in Eq. ([5]) 
could be calculated simply by averaging basis functions 
over simulation trajectories. Since the current formula- 
tion is exempted from any information about Pmit i<l) , 
initial conformations could be arbitrarily selected from 
different sources, such as coarse-grained simulation, high- 
temperature simulation, experimental knowledge, or even 
theoretical conjecture, then the conformational space 
could be more efficiently traversed and structured. 

Equation ([5]) could be written as G* ■ w = 0, {i = 
1, . . . ,p), where G* — {Gn, . . . , Gip)'^ is the vector with 
components G^ = Tij — % + ^, {hj = ■ ■ ,p), and 
Sij is the Kronecker delta symbol, w — {wi, . . . , Wp)'^ is 
the vector of trajectory weights, which could be thought 
as the normal vector of the hyper-plane spanned by the 
trajectory-mapped vectors {G*}. The equations could be 
further adjusted to the following form, 

Hw = 0, (9) 

by minimizing the residual / = ' ^ ~ 0)^. Here 

H — G^G is the covariance matrix of the vectors {G*}. 
Compared to Eq. ([5]) , Eq. ^ is easier for analytical treat- 
ment due to the symmetric form of H . It is also more 
general in application. For example, when multiple tra- 
jectories are generated from one initial conformation for 
better statistics, thus the number of trajectories is larger 
than the number of independent weig hts; Eq. jH) still 
works. 

There are a few key points in the WED method. {\) H 
is a positive semi-definite matrix with at least one zero 
eigenvalue. If the conformational space is well connected 
by simulation trajectories, the ground state of H will be 
non-degenerate, which uniquely determines the trajec- 
tory weights. Otherwise, if the simulation trajectories are 
isolated in different conformational regions by large free 
energy barriers, H will have degenerate ground state (i.e. 
multiple zero eigenvalues), leaving the relative weights 
between trajectories in different regions indeterminable. 
When a few trajectories connecting the different regions 
exist, the zero eigenvalues of H will be perturbed to small 
nonzero values. The small-eigenvalue eigenvectors of H 
could be manipulated to extract the information of con- 
formational states and transitions between them. This 
data mining process is usually performed by projecting 
{&} to the small- eigenvalue eigenvectors. Since {&} 
one to one corresponds to the simulation trajectories, 
the projection effectively maps the trajectories into low- 
dimensional space for subsequent classification. In the 



following, we align the eigenvectors of H by their eigen- 
values with an ascending order, e.g.^ the first eigenvector 
is the one corresponding to the smallest eigenvalue of H . 
The larger-eigenvalue eigenvectors mainly reflect the in- 
trastate statistical fluctuation among trajectories. 

(2) Although the expansion in Eq. ([7]) is exact if and 
only if the set of conformational functions is complete, it 
is not necessary to include too many basis functions in 
WED. This is because only the mean values of Q.x,eq{(t} 
over a large number of conformation samples, instead of 
the values of ^x,eq{<l) for different conformations, are re- 
quired in the construction of the linear equations of {wi\ 
[Eq. ([5]) and In practice, only physically relevant 
and important quantities of the simulation system need 
to be selected as basis functions to distinguish conforma- 
tional meta-stable states. The selection does not demand 
much foreknowledge. For biological macromolecules, the 
important inner coordinates, such as dihedral angles and 
pair distances, could be chosen as basis functions to char- 
acterize the conformational motion. Various physical 
quantities, such as the potential energy of the system, 
solute-solvent interactions, could also be included for the 
searching of related kinetic or thermodynamic phenom- 
ena. The variance-covariance matrix g*^'^ will ensure the 
consistent consideration of different classes of basis func- 
tions, and provide the measure of their relative impor- 
tance (after orthogonalizing the basis functions based on 

(3) There is only one free parameter in WED method, 
t* — t/r^ for collecting the X samples. Large t* may 
bring systematical error to the estimated equilibrium 
properties, and small t* will reduce the sample num- 
ber in X, leading to a larger statistical uncertainty. 
t* = 0.01 ~ 0.1 is usually applied in the current work 
with satisfiable results obtained. We also point out that, 
for the purpose of discovering the states in conforma- 
tional space, the results are not very sensitive to t* . 



III. RESULTS AND DISCUSSIONS 
A. System with one-dimensional glassy potential 

We first illustrate the WED method in a one- 
dimensional system with glassy potential. There are four 
major potential wells respectively located around the po- 
sitions — 1.25, —0.25, 0.75, and 1.75 in this system [see the 
inset of Fig. [Ijb)]. We name these potential wells with 
the positions of their minima in the following. For each 
WED simulation, 400 trajectories arc generated with the 
initial conformations randomly selected in the interval of 
[—2.0,2.0]. The system is investigated under five tem- 
peratures of 0.3, 0.6, 0.7, 1.1, and 2.0. More details are 
shown in the Appendix. 

The simulation trajectories and the ten smallest eigen- 
values of H for WED analyses under different temper- 
atures are shown in Fig. [T] At T = 0.3, there is 
no simulation trajectory connecting the different poten- 
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tial wells. As temperature increases, transition events 
emerge, and the total number of transition trajectories 
increases fast [see Fig. [Tfa)]. Correspondingly, four zero 
(or almost zero) eigenvalues of H are found at T = 0.3 
[see Fig. [TJb)]. As temperature increases, the fourth 
smallest eigenvalue begins to deviate from zero due to 
the transition trajectories between potential wells —1.25 
and —0.25. At T = 0.7, due to the several transition 
trajectories between potential wells —0.25 and 0.75, the 
third eigenvalue is also lifted to small positive value. Sub- 
sequently at T = 1.1, except for the smallest one, all the 
other eigenvalues are prominently deviating from zero. 
However, since only limited fraction of trajectories can 
pass the highest energy barrier in the system, the second 
smallest eigenvalue is still relatively small (about 0.1). 
Finally at T = 2.0, there is only one near-zero eigenvalue 
left, indicating the ergodicity of the simulation data as a 
whole. 

For WED simulation at T = 1.1, since there is only 
one zero eigenvalue of H, we could uniquely determine 
the trajectory weights to reproduce the equilibrium dis- 
tribution (or energy curve) of the system. Although, only 
part of the WED trajectories are found to be able to pass 
the highest potential barrier (either once or more), thus, 
single trajectory is far from ergodic, the energy curve of 
this system is correctly reproduced by re-weighting the 
simulation trajectories [see Fig.[2l^a)]. For further testing, 
we build a 150-trajectory subset from all the 400 trajecto- 
ries, so that the distribution of the initial conformations 
of the 150 trajectories is very different from that of all the 
400 trajectories. In practice, the subset involves all the 
75 trajectories initially inside the 1.75 potential well and 
75 ones that are randomly chosen from the remaining 325 
trajectories initially outside the 1.75 potential well. With 
the subset of trajectories, we reconstruct Eq. ([9]), recal- 
culate the trajectory weights, and use the weights and 
the 150 trajectories to reproduce the equilibrium distri- 
bution of the system. The energy curve is again correctly 
predicted as shown in Fig. ^a.) . More intuition could be 
perceived from Fig. (H^b), where the trajectory weights 
for both the original 400-trajectory data set and the 150- 
trajectory subset are shown. For the 150-trajectory data 
set, the fraction of conformations outside the 1.75 po- 
tential well is reduced a lot, thus, the trajectories which 
started outside the 1.75 potential well are specified larger 
weights in response. The changing of trajectory weights 
offsets the changing of the initial distribution of the tra- 
jectories, leading to the same reconstructed energy curve. 
Besides, the weight of a trajectory is found to be mainly 
dependent on the potential well from where the trajec- 
tory is started, rather than the initial conformation of the 
trajectory, consistent with our supposition in the deriva- 
tion of Eq. ©. 

At lower temperature of T = 0.6, the ground state 
of H is degenerate. We project G* to the second, third 
and fourth eigenvectors of H, which maps each trajec- 
tory i to the point {L2, L\) in a three-dimensional 
space. Here LJ^ = G* ■ Ua, is the ath eigenvector 



of H, and L\ is always zero. In the three-dimensional 
space, the 400 trajectories could be classified as shown 
in Fig. [TJc). There are four highly concentrated groups, 
respectively, with 77, 92, 37 and 66 points of almost the 
same coordinates. The points in these groups are verified 
to represent the non-transition trajectories in the poten- 
tial wells 1.75, 0.75, —0.25, and —1.25, respectively. The 
119 points located along the line connecting —1.25 and 
—0.25 correspond to transition trajectories between po- 
tential wells —1.25 and —0.25, the other 9 points located 
in the plane of —1.25, —0.25, and 0.75 correspond to that 
among the wells —1.25, —0.25, and 0.75. 

With the classification of trajectories, the equilibrium 
properties in each potential well could be estimated by 
the non-transition trajectories inside. Moreover, consid- 
ering the 119 transition trajectories between potential 
wells —1.25 and —0.25, it is possible to further extract 
more detailed information of the super-state containing 
these two states. We truncate the ending segment of each 
trajectory and repeat the analysis (i.e. reconstructing H, 
analyzing its spectral properties etc.). For the trajecto- 
ries in the super-state containing —1.25 and —0.25, we 
plot the {L\} calculated with truncated trajectories (at 
0.8r) versus those with the full trajectories in Fig.[n{a). 
In this figure, a one-to-one correspondence between the 
data points and the simulation trajectories exists, which 
helps to further classify the trajectories as follows, (i) 
The data points could be approximately separated into 
two groups. Those in the same group correspond to the 
trajectories which started from the same state (either 
— 1.25 or —0.25). (ii) The data points with extremal L\ 
values along both axes correspond to the non-transition 
trajectories. No matter whether truncated or not, these 
trajectories are always non-transition ones, and should 
always be similar to each other in L\ value, (iii) Except 
for the points corresponding to the non-transition tra- 
jectories, the other points in the dash-dotted horizontal 
lines in Fig. ^a) correspond to the transition trajecto- 
ries where transition does not happen before 0.8t. These 
trajectories become non-transition ones only after trun- 
cation and should have similar truncation-calculated L\ 
values. Besides, we also find that, (iv) the data points in 
the dashed-inclined lines correspond to trajectories with 
an odd number of transitions which all happen within 
0.8r. (v) The ones in the dotted-inclined lines correspond 
to trajectories with even number of transitions which all 
happen within O.Sr. (vi) All the other points outside the 
straight lines in Fig. [3)Ja) correspond to the trajectories 
with early transitions occurred within t, and the multi- 
ple transition trajectories with transition happened both 
within O.Sr and after O.Sr. Noticing that the truncation 
of trajectories actually adjusts the occupation fraction of 
the trajectories in different states, the observed alloca- 
tion from (iv) to (vi) could be explained by supposing 
the existence of the linear relation between {L\} and the 
occupation fraction of trajectories in different states. 

Since there are considerable fraction of non-transition 
trajectories within r, we could safely assume that almost 
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all the odd-transition trajectories are actually single- 
transition trajectories. Thus, their transition time should 
be linearly dependent on their occupation fraction in dif- 
ferent states. Therefore, the linear relation between tran- 
sition time and L\ value should exist for these putative 
single-transition trajectories. To predict the linear rela- 
tion, there are two key points in Fig.[3]Ja), corresponding 
to the two intersection points between dash-dotted hori- 
zontal lines and dashed-inclined lines. These two points 
correspond to single-transition trajectories, which started 
from different potential wells (—1.25 and —0.25), and 
happen transition at 0.8t. By truncating the trajecto- 
ries to different lengths {e.g., 0.5r, 0.6t, 0.7t, 0.8t and 
0.9r), and reproducing the analogous figures to Fig. [3^, 
a few pairs of key points could be collected. Two sets of 
linear relations between transition time and L\ value for 
the supposed single-transition trajectories could be de- 
termined, which apply, respectively, to the — 1.25-started 
trajectories (trajectories started from the —1.25 potential 
well) and the — 0.25-started trajectories. 

For all the transition trajectories in the dashed-inclined 
lines and dash-dotted horizontal lines in Fig. [3ja), 0.02r 
trajectory segments centered by the predicted transition 
times are shown in Fig. [31(b). Except for one trajec- 
tory, which is non-transition in O.Sr and happens twice 
transitions after O.Sr, the other trajectories all pass the 
potential barrier between —1.25 and —0.25 within the 
predicted time windows. We could expect that for sys- 
tems with well-defined transition states, it is always pos- 
sible to efficiently shorten the transition trajectories to 
locate the transition state ensemble. Similar example for 
the solvated alanine dipeptide system will be shown in 
Secinill 



B. System of solvated alanine dipeptide 

The alanine dipeptide molecule is shown in Fig. |4] (left 
panel). There are only two important main chain dihe- 
dral angles 4> and in this system. The 2 2- atom molecule 
is solvated in 522 TIP3P waters. 500 conformations are 
first collected from a 10ns simulation of the system at 
T = 600K. The projections of these conformations to 
the (j) — plane are shown in Fig. U] (right panel) . These 
conformations are mainly located in the three free-energy 
wells on the 4> — ip plane, i.e., C^'^ and an with negative 
(j) value (488 in 500) and C"^ with a positive (f) value 
(12 in 500). C^^ is less stable compared to Cy' and ur. 
Starting from each of these conformations, the system is 
simulated for 300ps (or 600ps). More details are shown 
in the Appendix. 

There exists only one zero eigenvalue of H under both 
temperatures of T = 450K and T = 300K [see Fig. EJi], 
suggesting that the equilibrium properties of the sys- 
tem could be reproduced with current simulation data. 
The trajectory weights estimated by WED are shown in 
Fig. EKb). At T = 450K, the weights of the Cy'^-started 
trajectories (trajectories started from C^"^) have similar 



mean value and fluctuation with those of the a^j-started 
trajectories. In contrast, the weights of the C^^-started 
trajectories are partially depressed to smaller values. 

At T = 300K, the weights of the C^'-started trajec- 
tories, as a whole, are slightly smaller than those of the 
a_R-started trajectories [see Fig. [SJd, right panel]. With 
current simulation data, the life time of C^' and an are 
estimated to be 30.5 and 25.2 ps at T = 300K (the cor- 
responding values are 9.56 and 10.3ps at T = 450K). 
Consequently, some of the 300ps trajectories may not be 
able to reach equilibrium between these two states. Thus, 
to reconstruct the equilibrium distribution in the region 
containing Cy' and a/j, it is inevitable to specify diversi- 
fied weights to different trajectories, which explains the 
slight difference in trajectory weights between the C^''- 
started and the ai^-started trajectories. In comparison 
with T — 450K, the weights of the Cy'^-started trajec- 
tories arc further depressed in response to the further 
instability of Cy^ at lower temperature. 

Adding potential barriers onto the standard dihedral 
energy terms of (p and ip, we re-simulate 500 trajectories 
with T — 600ps and analyze these trajectories with WED 
method. These potential barriers will kinetically further 
separate the three free-energy wells of C^', an, and Cf^ 
(see the Appendix for more details). As a result, two zero 
(or near zero) eigenvalues of H are found [see Fig. [Slja)], 
predicting two groups of simulation trajectories secluded 
in different free-energy wells. Consistently, the projec- 
tion values of {&} to the second eigenvector of H {i.e., 
{L|}) classify all the 500 trajectories into two groups (see 
Fig. El left panel). While 12 trajectories in one group are 
located inside Cy^, the remaining 488 trajectories could 
be further classified based on their Lg values, considering 
the small value of the third eigenvalue of H. These 488 
trajectories are identified as non-transition trajectories 
in the two states, C^' and afj, and transition trajecto- 
ries between them. In the L\-L\ plane, they lie along 
a straight line almost parallel to the L\ axis, with the 
non-transition trajectories gathering at the two termi- 
nals of the line (with extremal L\ values, see Fig.jH right 
panel). By independently analyzing the 488 trajectories, 
the weights of these trajectories could be obtained to 
construct the equilibrium distribution of the super-state 
containing Cy"^ and a^. The resulting free energy profile 
along is shown in Fig. [5jc) . We randomly throw away 
half of the trajectories that started from an {e.g., 132 
of 264 trajectories), and redo the WED analysis for the 
remaining 356 trajectories. Although the conformations 
in an have been (approximately) reduced by half, the es- 
timated weights of the remaining afl-started trajectories 
are indeed almost doubled relative to the previous val- 
ues. Consequently, the reconstructed free energy profile 
closely matches the previous one, suggesting the robust- 
ness of WED method. 

Similar to the one-dimensional system with glassy po- 
tential, for the 488 trajectories in Cy' and an potential 
wells, we also find the linear relation between the L\ value 
and the occupation fraction of trajectories. Analogous 
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figure to Fig. [Sja) is plotted as Fig. [Tlja). The predicted 
transition times for the putative single-transition trajec- 
tories [trajectories with their representing points located 
on the dashed inclined lines in Fig. El^a)] are in great 
agreement with their real transition times directly iden- 
tified along the simulation trajectories [see Figs.[7l^b) and 
[TJc)] . The fraction of the trajectories which happen tran- 
sition within the 6 ps (12 ps) time windows centered by 
their predicted transition times, reaches 75% (93%) per- 
cent. For illustration, the predicted 6ps segments of three 
trajectories are shown in Fig. [7{d). 



IV. CONCLUSION 



We present the WED scheme for systematically explor- 
ing the kinetic state structure of conformational space. 
WED works by automatically discovering the mutual re- 
lation between parallel generated trajectories. It could 
also combine partially overlapped trajectories in confor- 
mational space to estimate the equilibrium properties of 
complex systems with rugged potential-energy surface. 
The method works well without existing experimental or 
simulation knowledge of the system, and may not suffer 
much from increasing system dimension by only applying 
relevant physical quantities as basis functions. Exempted 
from the knowledge of the initial conformational distribu- 
tion, WED provides flexibility to choose as dispersive as 
possible start points of trajectories in the whole confor- 
mational space. For example, initial conformations could 
come from simulations at different temperatures, as well 
as theoretically and experimentally important conforma- 
tions {e.g., completely or partially folded structures of 
proteins). After arbitrarily adding back the missing de- 
grees of freedom and short relaxation, it is also possible to 
adopt coarse-grained simulation conformations as start- 
ing points of WED simulation. Since the detailed sub- 
states in shorter time scales could be detected by sequen- 
tially chopping the trajectories and repeating our anal- 
ysis, the hierarchical state structure of the free-energy 
landscape [l^ could be distilled out up to the total sim- 
ulation time scale. Finally, for slow transition dynamics, 
which is impractical to be realized by simply increas- 
ing trajectory number, the combination of WED with 
current techniques for studying two-point slow dynam- 
ics d, 0, [1| should be interesting. 
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V. APPENDIXES 

A. Simulation and analysis method for 
one-dimension system 

For one-dimension system with coordinate x and po- 
tential function U{x), the over-dampened Langevin equa- 
tion 



dx 
IE 




(10) 



is adopted to generate the dynamical trajectories of x. 
Where 7 is the frictional coefficient, fes is the Boltzmann 
constant, T is the simulation temperature, and ^{t) is the 
white noise satisfying {^{t)^{t')) = 5{t — t') with () denot- 
ing ensemble average of noise. We simply take ks and 7 
as unity to get the dimension-reduced units for time, po- 
sition and temperature. Reflecting boundary condition 
is assigned for all the simulations. 

The analytical expression of one-dimension glassy po- 
tential is as following 
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For ED simulations, 400 initial positions are randomly 
selected in [—2.0, 2.0] interval. Each trajectory is sim- 
ulated for 50 time length, and the frames are recorded 
every 0.01 time length. In analysis, t* = 0.02 is chosen, 
corresponding to 40000 conformations in the X sample. 
The one-dimension trigonometrical functions 



cos(— — ),sm(— -) TO, n= 1,2, 3. 



(11) 



are selected as basis functions in analysis. The first 20 
basis functions, e.g. to, n both from 1 to 10, are in- 
cluded in calculation. Similar results could be obtained 
with 10 basis functions, with m and n both from 1 to 
5. To reconstruct the energy curve by WED method, 
a 40-bin histogram is generated. The theoretical curve 
is calculated by integrating the theoretical equilibrium 
distribution inside each bin. 



B. Simulation and analysis method for alanine 
dipeptide with explicit solvent 

The alanine dipeptide, a 22-atom small peptide 
molecule, is solvated in 522 TIP3P water molecules with 
totally 1588 atoms in the system. All the simulations 
are performed under NVT ensemble (constant parti- 
cle number, constant volume and constant temperature) 
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with NAMD simulation package [20l| and Charmm27 force 
field. The Langevin thermostat is chosen to keep the 
temperature of the system, with a damping coefficient of 
5ps~^. Periodic boundary condition is imposed. The sys- 
tem has a box size of 25.9A x 24.5A x 27.8A, and PME 
method is applied to calculate the electrostatic energy 
with Particle Mesh Ewald PME grid size chosen as 32 
along all the directions. Van de Waals interaction is cut 
off at 12 A and switched to zero from lOA. In the WED 
simulations, the first Ips simulation for each trajectory 
is taken as relaxation, then each trajectory is simulated 
for 300ps with conformations recorded every 0.5ps (BOOK 
and 450K simulation with standard force field) or 600ps 
with conformations recorded every 0.2ps (300K simula- 
tion with modified force field). 

To mimic a disconnected free energy landscape under 
300K temperature for this system, we added the following 
style of boundary potential on the Charmm27 force field 
for selected dihedral angles. 



the physical basis functions are basically the same. The 
12 basis functions of one and two order trigonometrical 
functions could already reproduce the results in current 
paper. The t* value of 0.05 is adopted to analyze the sim- 
ulation data at 450K and BOOK with standard force field. 
The one of 0.02 is adopted to analyze the simulation data 
at BOOK with modified force field. To reconstruct the free 
energy surface, the interval of [—180, 180], either for or 
"0, is divided into B6 bins for histogram construction. 



f{X,Xlow,Xhigh, ^XiEa) 



cos(x-Xlo.i;)-cos(Ax) rp 
l-cos(Ax) 
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Xlow - Ax < X <Xi ow 

Xtow X ^ Xhigh 

Xhigh < X < Xhigh - Ax 

Xhtgh < X < Xlow + B60 



Where, x is the degree of freedom on which the mod- 
ified potential acts, xiow and Xhigk are the lower and 
upper boundary for the potential function. Within the 
boundary, the potential function is identically equal to 
Eq. Out of the boundary, the potential function will 
reduce to zero within Ax length. The angles are mea- 
sured with degree, and the energy unit is Kcal/mol. 
Three potential functions, /(V', —155, —150, 10, 8.5), 
0,10, 15, 12) and /(</>, 130, 140, 15, 4) are added by 
NAMD simulation package to modify the free energy sur- 
face. f{ip, —155, —150, 10, 8.5) adjusts one of the two free 
energy barriers between C^'' and aji, lifting the one at 
— 150 degree of V' angle to approximately the same height 
as the one at 25 degree. The other two free energy func- 
tions are used to separate Cy^ with the other free energy 
wells in conformational space. 

In the WED analysis, we first choose physically impor- 
tant quantities as basis functions. The selected physical 
quantities include the potential energy of the whole sys- 
tem, the square of potential energy, the sum of all dihe- 
dral energies in alanine dipeptide, the sum of all bonded 
energies (bond, angle and dihedral energies) in alanine 
dipeptide. The physically selected basis functions are 
then complemented by two dimensional trigonometrical 
functions of dihedral angles and ^p. These functions are 

sin [(m + n)(j)\, cos[(to + n)(/)], to + n > 
sin (m(f)) sin (wtp), sin (to0) cos (mp), cos (mi/)) sin (wtp), cos (mcj)) cos {nip), m > l,n ^lE) 

We define the summation of m and n in equation p2p as 
the order of these functions, and adopt the one to four or- 
der functions in our analysis. Together with the physical 
quantities, there are 44 basis functions included in cal- 
culation. However, the results obtained with or without 
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FIG. 1: (Color online) Trajectories, eigenvalues of H and pro- 
jection map of one- dimensional system. (A) Trajectories sim- 
ulated at four different temperatures with a one-dimension 
glassy potential. All 400 trajectories are plotted except for 
T — 2.0, where only 20 randomly selected ones are plot- 
ted. The horizontal axis denotes the reduced simulation time 
with the full simulation time scaled to 1.0. The vertical axis 
denotes the coordinate of one- dimensional system. (B) The 
first ten eigenvalues of H at five temperatures. The one- 
dimensional glassy potential is also shown as inset. (C) The 
projection of {G*} to the second, third and fourth eigenvec- 
tors of H {T = 0.6). The points (red) representing non- 
transition trajectories in the four major potential wells 1.75, 
0.75, —0.25 and —1.25 are gathering, respectively, inside the 
four regions enclosed by dashed-edge squares. The correspon- 
dence between states and regions is labeled in the graph. The 
other points (cyan) represent the transition trajectories be- 
tween states. See the main text for more details. 



9 



A 




FIG. 2; (Color online) Reconstructed energy curve and the 
trajectory weights of one- dimensional system. (A) The energy 
curves reconstructed with 150-trajectory (black upward tri- 
angles) and 400-trajectory (red downward triangles) data set 
are shown together with the theoretical curve (blue solid line) . 
The three lines almost overlap with each other. (B) Trajec- 
tory weights for 150-trajectory (black upward triangles) and 
400-trajectory data set (red downward triangles) are shown. 
The horizontal axis denotes the initial position of trajecto- 
ries. The weights has been scaled to ensure the equal average 
weight of trajectories from 1.75 potential well for the two data 
sets. The potential energy is also shown as solid line. 
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FIG. 3: (Color online) Trajectory identification and shorten- 
ing for locating transition state ensemble in one-dimensional 
system. (A) {L\} calculated with truncated trajectories (at 
0.8r) versus those with the full trajectories. Only data for 
trajectories which started from the potential wells —1.25 and 
—0.25 are shown. The non-transition trajectories in —1.25 
and —0.25 are shown as upward and downward triangles re- 
spectively, and the transition trajectories which started from 
the two states are plotted as (blue) circles and (red) squares, 
respectively. The straight lines are plotted by hand, and the 
arrows indicate the intersection points between dash-dotted 
horizontal lines and dashed-inclined lines. The inset illustrate 
the truncation of one simulation trajectory at the normalized 
time t/r = 0.8. (B) For the supposed single-transition tra- 
jectories, short (0.02t) trajectory segments centered by the 
predicted transition times are shown. The upper panel shows 
the predicted transitions from the potential wells —1.25 to 
—0.25; the lower panel shows the transitions predicted to be 
in the reversed direction. The horizontal axes denote the re- 
duced simulation time with the full simulation time scaled to 
1.0. 




FIG. 4: (Color online) Illustration of alanine dipeptide 
molecule and the selected initial conformations from 600K 
simulation. In the left panel, the alanine dipeptide molecule 
is shown with the two important dihedral angles, (j> and t/j, 
labeled. In the right panel, the 500 initial conformations of 
WED simulation are projected onto the (f>-^ plane (red stars), 
taking their dihedral angles as coordinates. The background 
is the free energy surface on (j)-ip plane reconstructed with the 
WED analysis of 450K data. 
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FIG. 5: (Color online) Eigenvalues of H, trajectory weights, 
and reconstructed free-energy surface for solvated alanine 
dipeptide system. (A) The eigenvalues of H for solvated ala- 
nine dipeptide system with different temperatures and force 
fields. '300K mod' labels the results of 300K simulation with 
modified potential. (B) The calculated trajectory weights for 
simulations at 450K (left panel) and 300K (right panel) with 
standard force field. The horizontal axis denotes the (p angle 
of the initial conformations of trajectories. The trajectories 
started from C^', ur and C^^ regions are plotted as (red) 
upward triangles, (blue) downward triangles and (black) cir- 
cles, respectively. (C) The free energy curve along axis con- 
structed by the weighted trajectories simulated with modified 
potential at 300K. The curve labeled with (black) upward tri- 
angles is obtained with all the 488 trajectories in C!^''(224: in 
488) and Qij(264 in 488); the one labeled with (red) downward 
triangles is constructed with half of the trajectories started 
from Or omitted. 
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FIG. 6: (Color online) Projection map of solvated alanine 
dipeptide system simulated at 300K with modified force field. 
In the left panel, the projection of {G'} to the second and 
third eigenvectors of H are shown. The points (red circles) 
representing the non-transition trajectories in the three free- 
energy wells of Cy', ur and C^^ axe gathering, respectively, 
inside the three regions enclosed by dashed-edge squares. The 
correspondence between states and regions is labeled in the 
graph. The other points (cyan circles) represent the transi- 
tion trajectories between states. See the main text for more 
details. In the right panel, the histogram of L3 value for 
trajectories in Cy' and ur is shown to illustrate the concen- 
tration of trajectories at the extremal values of L3. 




FIG. 7: (Color online) Trajectory identification and short- 
ening for locating transition states in the system of solvated 
alanine dipeptide. (A) I/3 calculated with truncated trajec- 
tories (at 0.8r) versus those with full trajectories for the sim- 
ulations with modified potential at T = 300K. Only data 
for trajectories which started from aa and Cj'^ are shown. 
The non-transition trajectories in an and C^'^ are shown as 
upward and downward triangles, respectively, and the tran- 
sition trajectories started from the two states are plotted as 
(blue) circles and (red) squares, respectively. The straight 
lines are plotted by hand, and the arrows indicate the in- 
tersections between dash-dotted horizontal lines and dashed 
inclined lines. (B) The predicted transition time, tp, for the 
supposed single-transition trajectories versus their real tran- 
sition time, tr- The tp = tr relation is plotted as (red) dashed 
line for comparison. (C) The error of the predicted transition 
time versus the real transition time. (D) Illustration of transi- 
tion paths. Three 6ps-length segments of trajectories around 
their predicted transition time are shown with different sym- 
bols. The background is the free energy surface reconstructed 
from 300K simulation trajectories with modified potential. 



